Finding recurrent out-of-vocabulary words

نویسندگان

  • Long Qin
  • Alexander I. Rudnicky
چکیده

Out-of-vocabulary (OOV) words can appear more than once in a conversation or over a period of time. Such multiple instances of the same OOV word provide valuable information for estimating the pronunciation or the part-of-speech (POS) tag of the word. But in a conventional OOV word detection system, each OOV word is recognized and treated individually. We therefore investigated how to identify recurrent OOV words in speech recognition. Specifically, we propose to cluster multiple instances of the same OOV word using a bottom-up approach. Phonetic, acoustic and contextual features were collected to measure the distance between OOV candidates. The experimental results show that the bottom-up clustering approach is very effective at detecting the recurrence of OOV words. We also found that the phonetic feature is better than the acoustic and contextual features, and the best performance is achieved when combining all features.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features

The repeated use of out-of-vocabulary (OOV) words in a spoken document seriously degrades a speech recognizer’s performance. This paper provides a novel method for accurately detecting such recurrent OOV words. Standard OOV word detection methods classify each word segment into in-vocabulary (IV) or OOV. This word-by-word classification tends to be affected by sudden vocal irregularities in spo...

متن کامل

Class-Based N-Gram Language Model for New Words Using Out-of-Vocabulary to In-Vocabulary Similarity

Out-of-vocabulary (OOV) words create serious problems for automatic speech recognition (ASR) systems. Not only are they missrecognized as in-vocabulary (IV) words with similar phonetics, but the error also causes further errors in nearby words. Language models (LMs) for most open vocabulary ASR systems treat OOV words as a single entity, ignoring the linguistic information. In this paper we pre...

متن کامل

Finding Variants of Out-of-Vocabulary Words in Arabic

Transliteration of a word into another language often leads to multiple spellings. Unless an information retrieval system recognises different forms of transliterated words, a significant number of documents will be missed when users specify only one spelling variant. Using two different datasets, we evaluate several approaches to finding variants of foreign words in Arabic, and show that the l...

متن کامل

The Effect of Visually-Mediated Collocations on the Elementary EFL Learners’ Vocabulary Learning

When vocabulary teaching is taken into account in EFL classes in our Iranian state primary schools, teachers generally prefer to use classical techniques. The purpose of this study was to investigate the effect of visually-mediated collocations on the elementary EFL learners’ vocabulary learning. In order to conduct this study, 60 students from two classrooms in an elementary class, participate...

متن کامل

Task-Induced Involvement in L2 Vocabulary Learning: A Case for Listening Comprehension

The study aimed at investigating whether the retention of vocabulary acquired incidentally is dependent upon the amount of task-induced involvement. Immediate and delayed retention of twenty unfamiliar words was examined in three learning tasks( listening comprehension + group discussion, listening comprehension + dictionary checking + summary writing in L1, and listening comprehension + dictio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013